194 research outputs found

    Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation

    Get PDF
    The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Get PDF
    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types

    FastBLAST: Homology Relationships for Millions of Proteins

    Get PDF
    BackgroundAll-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding.Methodology/principal findingsWe present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query.Conclusions/significanceFastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast

    Novel insights into the insect trancriptome response to a natural DNA virus

    Get PDF
    ArticleCopyright © 2015 McTaggart et al.; licensee BioMed Central.Background Little is known about invertebrate responses to DNA viruses. Here, we infect a commercially important pest moth species Plodia interpunctella with its naturally infecting DNA virus. We sequenced, assembled and annotated the complete transcriptome of the moth, and a partial transcriptome of the virus. We then tested for differential gene expression between moths that were exposed to the virus and controls. Results We found 51 genes that were differentially expressed in moths exposed to a DNA baculovirus compared to controls. Gene set enrichment analysis revealed that cuticle proteins were significantly overrepresented in this group of genes. Interestingly, 6 of the 7 differentially expressed cuticle proteins were downregulated, suggesting that baculoviruses are able to manipulate its host’s response. In fact, an additional 29 of the 51 genes were also downregulated in exposed compared with control animals, including a gram-negative binding protein. In contrast, genes involved in transposable element movement were upregulated after infection. Conclusions We present the first experiment to measure genome-wide gene expression in an insect after infection with a natural DNA virus. Our results indicate that cuticle proteins might be key genes underpinning the response to DNA viruses. Furthermore, the large proportion of genes that were downregulated after viral exposure suggests that this virus is actively manipulating the insect immune response. Finally, it appears that transposable element activity might increase during viral invasion. Combined, these results provide much needed host candidate genes that respond to DNA viral invaders.NERC Biomolecular Analysis Facility (NBAF

    TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool.

    Get PDF
    BACKGROUND: Transmembrane proteins (TMPs) are the key components of signal transduction, cell-cell adhesion and energy and material transport into and out from the cells. For the deep understanding of these processes, structure determination of transmembrane proteins is indispensable. However, due to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms resulting in the challenge of bioinformatics; how the structural information should be gained from a sequence. RESULTS: Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec based on statistical potentials was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77 % of the cases. This accuracy overcomes the state-of-the-art methods. In addition, a key feature of TMFoldRec algorithm is the ability to estimate the reliability of the prediction and to decide with an accuracy of 70 %, whether the obtained, lowest energy structure is the native one. CONCLUSION: These results imply that the membrane embedded parts of TMPs dictate the TM structures rather than the soluble parts. Moreover, predictions with reliability scores make in this way our algorithm applicable for proteome-wide analyses. AVAILABILITY: The program is available upon request for academic use
    corecore